Text mixing shapes the anatomy of rank-frequency distributions.

Authors

  • Jake Ryland Williams
  • James P Bagrow
  • Christopher M Danforth
  • Peter Sheridan Dodds
Abstract

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.
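
As a rough illustration of the rank-frequency relationship described in the abstract, the following sketch builds a rank-frequency list and estimates its slope on log-log axes, where Zipf's law corresponds to a slope near -1. It is a minimal example under stated assumptions, not the authors' analysis: the token list is synthetic (constructed so that the word of rank r appears about top_count / r times), the function names are hypothetical, and the ordinary least-squares fit is only a simple stand-in for a proper scaling analysis. It does not model text mixing or the two scaling regimes.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def synthetic_zipf_tokens(vocab_size, top_count):
    """Assumption: toy data in which the rank-r word occurs about top_count / r times."""
    tokens = []
    for r in range(1, vocab_size + 1):
        tokens.extend([f"w{r}"] * max(1, round(top_count / r)))
    return tokens


tokens = synthetic_zipf_tokens(vocab_size=200, top_count=1000)
freqs = rank_frequency(tokens)
slope = loglog_slope(freqs)
print(f"estimated log-log slope: {slope:.2f}")
```

To try this on real text, replace the synthetic list with the lowercased words of a document; a single slope will not capture a break in scaling, so a large aggregated corpus would need separate fits over different rank ranges.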

Similar resources

Text mixing shapes the anatomy of rank-frequency distributions: A modern Zipfian mechanics for natural language

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf’s law which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this ‘law’ of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora over the last 15 years have revealed the existenc...

Design and characterization of biodegradable polymer-clay nanocomposites prepared by solution mixing technique

This paper discusses the preparation of biodegradable polymer/clay nanocomposites based on an organically modified montmorillonite clay (Cloisite 10A) and the biodegradable polymer chitosan by a solution mixing technique, and their characterization. The nanocomposites were successfully prepared and their structures were characterized by powder X-ray diffraction (XRD), particle size analyzer (Beckm...

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text-based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample was 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Nonparametric signed-rank Shewhart control chart with variable sampling interval

A nonparametric control chart based on ranks is used for detecting changes in the median (mean). In this article, the signed-rank control chart is considered with a variable sampling interval. We compared the performance of the signed-rank chart with variable sampling interval (VSI-SR) to the signed-rank chart with fixed sampling interval (FSI-SR); the numerical results demonstrated that the VSI feature is useful. Bakir[1] showed ...

Journal:
  • Physical review. E, Statistical, nonlinear, and soft matter physics

Volume 91, Issue 5

Pages: -

Publication date: 2015